
    Towards automatic classification within the ChEBI ontology

    *Background*
Appearing in a wide variety of contexts, biochemical 'small molecules' are a core element of biomedical data. Chemical ontologies, which provide stable identifiers and a shared vocabulary for referring to these molecules, are crucial for the interoperation of such data. One such chemical ontology is ChEBI (Chemical Entities of Biological Interest), a candidate member ontology of the OBO Foundry. ChEBI is a publicly available, manually annotated database of chemical entities and contains around 18,000 annotated entities as of its most recent release (May 2009). ChEBI provides stable unique identifiers for chemical entities; a controlled vocabulary in the form of recommended names (which are unique and unambiguous), common synonyms, and systematic chemical names; cross-references to other databases; and a structural and role-based classification within the ontology. ChEBI is widely used for annotating chemicals within biological databases, for text mining, and for data integration. ChEBI can be accessed online at http://www.ebi.ac.uk/chebi/ and the full dataset is available for download in various formats, including SDF and OBO.
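Because the full dataset is distributed in OBO format, here is a minimal sketch of reading it in Python: it collects the `id`, `name` and `is_a` tags from `[Term]` stanzas. It assumes the standard OBO flat-file layout; the function name `parse_obo_terms` and the sample stanza (with made-up `EX:` identifiers) are hypothetical, and real release files contain many more tags, qualifiers and escapes that this sketch ignores.

```python
def parse_obo_terms(text):
    """Map each [Term] id to {"name": ..., "is_a": [...]} from OBO flat-file text.

    Minimal sketch: only the id, name and is_a tags of [Term] stanzas are read;
    other stanza types (e.g. [Typedef]) and all other tags are skipped.
    """
    terms = {}
    current = None  # the [Term] stanza being filled, or None outside one
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        # Drop a trailing "! comment" (e.g. the parent's name after an is_a id).
        value = value.split(" !", 1)[0].strip()
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


if __name__ == "__main__":
    # Hand-written sample with hypothetical EX: identifiers (not ChEBI data).
    SAMPLE = """format-version: 1.2

[Term]
id: EX:0001
name: water
is_a: EX:0002 ! example parent class

[Typedef]
id: has_role
name: has role
"""
    print(parse_obo_terms(SAMPLE))
    # {'EX:0001': {'name': 'water', 'is_a': ['EX:0002']}}
```

Keeping the parser to three tags keeps the sketch short; a production loader would rather use a dedicated OBO or OWL library.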

*Automated Classification*
The selection of chemical entities for inclusion in the ChEBI database is user-driven. As the use of ChEBI has grown, so too has the backlog of user-requested entries. This backlog inevitably creates an annotation bottleneck, and to speed up the annotation process ChEBI has recently released a submission tool that allows the community to submit chemical entities, groups and classes. However, classifying chemical entities within the ontology is a difficult, specialist task, and the community as a whole is unlikely to be able or willing to classify each submitted entity correctly and consistently, creating any required classes that are missing. As a result, the ontological classification is likely to become less sophisticated as the database grows, unless the classification of new entities is assisted computationally. In addition, the ChEBI database expects substantial growth over the next year, so automatic classification, which until now has not been possible, is urgently required. Automatic classification would also enable the ChEBI ontology classes to be applied to other compound databases such as PubChem.

*Description Logic Reasoning*
Description logic-based reasoning technology is a prime candidate for developing such an automatic classification system, as it allows the rules of the classification system to be encoded within the knowledge base. With 18,000 entities already, ChEBI is a sizeable real-world application of description logic reasoning technology, and as the ontology is enriched with a denser set of asserted relationships, classification will become more complex and challenging. We have successfully tested a description logic-based classification of chemical entities, based on specified structural properties, using the hypertableau-based HermiT reasoner, and found it efficient enough to be feasible in a production environment on a database of ChEBI's current size. However, much work remains, both to enrich the ChEBI knowledge base itself with the properties needed to provide formal class definitions for automated classification, and to assess the efficiency of the available description logic reasoning technology on a database of the size ChEBI is forecast to reach.
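To illustrate what classifying entities from specified structural properties means, here is a deliberately simplified Python sketch. It is not HermiT and not hypertableau reasoning: it treats each defined class as a conjunction of existential restrictions of the form `property some filler` and decides membership by set inclusion, after closing the entity's properties under a filler hierarchy. The class definitions, the `has_part` property name and the group names are hypothetical examples, not ChEBI's formal definitions.

```python
# Hypothetical definitions: each defined class is the conjunction of
# existential restrictions "property some filler", stored as (property, filler).
DEFINITIONS = {
    "carboxylic acid": frozenset({("has_part", "carboxy group")}),
    "amino acid": frozenset({("has_part", "carboxy group"),
                             ("has_part", "amino group")}),
}


def close_under_fillers(props, filler_parents):
    """'p some X' and 'X is_a Y' together imply 'p some Y'; add all such pairs."""
    closed = set(props)
    stack = list(props)
    while stack:
        prop, filler = stack.pop()
        for parent in filler_parents.get(filler, ()):
            if (prop, parent) not in closed:
                closed.add((prop, parent))
                stack.append((prop, parent))
    return closed


def classify(props, definitions=DEFINITIONS, filler_parents=None):
    """Return, sorted, every defined class whose restrictions the entity satisfies."""
    closed = close_under_fillers(props, filler_parents or {})
    return sorted(name for name, required in definitions.items()
                  if required <= closed)


def subsumes(general, specific, definitions=DEFINITIONS):
    """Structural subsumption, ignoring the filler hierarchy: a subset of
    restrictions means a more general class."""
    return definitions[general] <= definitions[specific]


if __name__ == "__main__":
    glycine = {("has_part", "carboxy group"), ("has_part", "amino group")}
    print(classify(glycine))  # ['amino acid', 'carboxylic acid']
    print(subsumes("carboxylic acid", "amino acid"))  # True
```

A DL reasoner such as HermiT handles far richer constructs (disjointness, cardinality restrictions, negation) that this set-inclusion shortcut cannot express.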

*Acknowledgements*
ChEBI is funded by the European Commission under SLING, grant agreement number 226073 (Integrating Activity) within Research Infrastructures of the FP7 Capacities Specific Programme, and by the BBSRC, grant agreement number BB/G022747/1 within the “Bioinformatics and biological resources” fund.

    ChEBI, an Open-access Chemistry Resource for the Life Sciences: Facilities for On-line Submission and Curation

    ChEBI (Chemical Entities of Biological Interest) is a database of ‘small’ molecular entities structured around a chemical ontology. It contains almost 600,000 entries, of which approximately 20,000 have been manually curated, as well as entries for groups (parts of molecular entities) and classes of entities. It provides a wide range of information, such as chemical nomenclature, structures and related chemical values, and establishes interrelationships between entities in the ontology in terms of both structure and role. ChEBI places a strong focus on quality, with considerable effort devoted to upholding IUPAC nomenclature recommendations and IUPAC best practice when drawing chemical structures.

To invite the community to participate more directly in the future growth and development of ChEBI, we have developed a web-based software utility for direct user submissions. Users are encouraged to carry out as much of their own manual curation as possible, e.g. by adding multiple synonyms and database cross-references, and by creating multiple relationships within the ontology. Submissions are automatically validated for uniqueness (of both name and chemical structure) and correctness (for example, checking that no disallowed cycles have inadvertently been created in the ontology graph, and that the specified ontology relationships are permitted between entities of the relevant types). Once a submission has passed the required validations, it is entered into the ChEBI database, at which point it receives its unique ChEBI identifier. It then becomes publicly visible (as a preliminary entry) as part of the monthly ChEBI release. To date, ChEBI has received over 750 such external submissions.
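The validation steps described above can be sketched in Python as follows. Everything here is an assumption for illustration, not the ChEBI submission tool's code: the rule tables `ACYCLIC` and `ALLOWED`, the entity types, the use of a precomputed `structure_key` (for example an InChIKey) for structural uniqueness, and the `<new>` placeholder standing in for the identifier that is only assigned after acceptance. Cycles are checked only for relations listed in `ACYCLIC`, mirroring the point that only some cycles are disallowed.

```python
from collections import defaultdict

NEW = "<new>"  # placeholder: the real ChEBI ID is assigned only after acceptance

# Hypothetical rule tables (illustrative, not ChEBI's actual configuration).
ACYCLIC = {"is_a"}  # relations whose graph must stay acyclic
ALLOWED = {  # permitted (relation, subject type, object type) combinations
    ("is_a", "molecular entity", "molecular entity"),
    ("is_a", "role", "role"),
    ("has_role", "molecular entity", "role"),
}


def has_cycle(edges):
    """Depth-first search for a back edge in a directed graph of (src, dst) pairs."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    state = {}  # node -> "active" while on the DFS path, "done" once finished

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                return True  # back edge: nxt is an ancestor on the current path
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


def validate(submission, existing, edges, types):
    """Return a list of error messages; an empty list means the submission passes.

    submission: dict with "name", "structure_key", "type" and "relations",
                the latter a list of (subject, relation, object) triples.
    existing:   dict with sets "names" and "structure_keys".
    edges:      existing (subject, relation, object) triples in the ontology.
    types:      entity id -> entity type for existing entities.
    """
    errors = []
    if submission["name"] in existing["names"]:
        errors.append("name already in use")
    if submission["structure_key"] in existing["structure_keys"]:
        errors.append("structure already in database")
    types = {**types, NEW: submission["type"]}
    for subj, rel, obj in submission["relations"]:
        if (rel, types[subj], types[obj]) not in ALLOWED:
            errors.append(f"{rel} not allowed from {types[subj]} to {types[obj]}")
    for rel in sorted(ACYCLIC):
        pairs = [(s, o) for s, r, o in list(edges) + list(submission["relations"])
                 if r == rel]
        if has_cycle(pairs):
            errors.append(f"{rel} cycle created")
    return errors
```

Note that a brand-new entry with only outgoing `is_a` links cannot close a cycle; the check matters when a submission also adds relationships pointing at the new entry or between existing entities.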

    Epitopes in ChEBI - A Collaboration with the IEDB

    *ChEBI background:* Chemical Entities of Biological Interest (ChEBI) is a curated database of small chemical entities important in biosystems. As well as describing these entities, it provides a semantically rich knowledge base and an internal hierarchy that organises the entities by their molecular structure types and potential rôles.

*The ChEBI-IEDB collaboration:* The Immune Epitope Database and Analysis Resource (IEDB) is a project supported by a contract from the National Institute of Allergy and Infectious Diseases (NIAID). Its goal is to make epitope-related data on infectious diseases and immune disorders freely available to researchers worldwide. In June 2009, ChEBI began working with the IEDB on a project to incorporate into ChEBI, by manual curation, a pilot subset of immunologically important chemicals identified as immune epitopes.

*The significance of the project:* Numerous reports attest to an increasing global prevalence of immune-related diseases, with multiple contributing factors. This situation underscores the need for cross-talk among scientific disciplines and makes ChEBI's involvement in this project particularly relevant.

*Collaboration outcome:* The ChEBI-IEDB teamwork has amply demonstrated that collaboration among curators working on different databases can be mutually beneficial: while the incorporated IEDB items have substantially enriched ChEBI, ChEBI's multiplicity of synonyms, structure-tree layout and expertise in describing non-peptidic epitopes have been equally useful to the IEDB in facilitating the search process.

*Status quo and plans:* We continue to refine our support for the identification, understanding and utilisation of biologically meaningful chemical entities by engaging in further joint projects.

    OntoQuery: easy-to-use web-based OWL querying

    Summary: The Web Ontology Language (OWL) provides a sophisticated language for building complex domain ontologies and is widely used in bio-ontologies such as the Gene Ontology. The Protégé-OWL ontology editing tool provides a query facility that allows queries to be composed and executed in the human-readable Manchester OWL syntax, with syntax checking and entity label lookup. However, no equivalent of the Protégé Description Logics (DL) query facility yet exists in web form, and many users interact with bio-ontologies such as Chemical Entities of Biological Interest (ChEBI) and the Gene Ontology through their online websites, where DL-based querying is not available. To address this gap, we introduce the OntoQuery web-based query utility. Availability and implementation: The source code for this implementation, together with installation instructions, is available at http://github.com/IlincaTudose/OntoQuery. OntoQuery is fully compatible with all OWL-based ontologies and is available for download (CC-0 license). The ChEBI installation, ChEBI OntoQuery, is available at http://www.ebi.ac.uk/chebi/tools/ontoquery.

    The influence of aryl-aryl interactions in the photochemistry of some 1,3-Diarylpropanes

    The irradiation of 1,3-diarylpropanols in acidic methanol results in their conversion to the corresponding methyl ethers. This reaction, and the photodechlorination of some 1,3-diarylpropanes, is influenced by the presence of electron-donating substituents in the aryl group remote from the reactive site.

    ChEBI in 2016: Improved services and an expanding collection of metabolites

    ChEBI is a database and ontology containing information about chemical entities of biological interest. It currently includes over 46,000 entries, each of which is classified within the ontology and assigned multiple annotations including (where relevant) a chemical structure, database cross-references, synonyms and literature citations. All content is freely available and can be accessed online a

    Rhea—a manually curated resource of biochemical reactions

    Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive resource of expert-curated biochemical reactions. Rhea provides a non-redundant set of chemical transformations for use in a broad spectrum of applications, including metabolic network reconstruction and pathway inference. Rhea includes enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list), transport reactions and spontaneously occurring reactions. Rhea reactions are described using chemical species from the Chemical Entities of Biological Interest ontology (ChEBI) and are stoichiometrically balanced for mass and charge. They are extensively manually curated, with links to source literature and other public resources on metabolism, including enzyme and pathway databases. This cross-referencing facilitates the mapping and reconciliation of common reactions and compounds between distinct resources, which is a common first step in the reconstruction of genome-scale metabolic networks and model
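Rhea reactions are stated to be balanced for mass and charge. A minimal Python sketch of such a check sums element counts and charges on each side of a reaction, weighted by stoichiometric coefficients. It assumes flat molecular formulas without brackets, hydrate dots or generic 'R' groups (which real ChEBI formulas can contain); the function names are hypothetical and this is not Rhea's own curation tooling. The usage example is ATP hydrolysis written with ionised species: ATP(4-), ADP(3-), hydrogenphosphate(2-) and H+.

```python
import re
from collections import Counter

# An element symbol (capital letter, optional lower-case letter) and optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C10H12N5O13P3'.

    Assumes no brackets, hydrate dots or generic 'R' groups; those
    notations are not handled by this sketch.
    """
    counts = Counter()
    for element, number in TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side):
    """Total atoms and net charge for one side, from (coefficient, formula, charge) triples."""
    atoms, charge = Counter(), 0
    for coeff, formula, z in side:
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * z
    return atoms, charge


def is_balanced(left, right):
    """True if both sides carry the same atoms (mass balance) and the same net charge."""
    return side_totals(left) == side_totals(right)


if __name__ == "__main__":
    # ATP(4-) + H2O = ADP(3-) + hydrogenphosphate(2-) + H(+)
    left = [(1, "C10H12N5O13P3", -4), (1, "H2O", 0)]
    right = [(1, "C10H12N5O10P2", -3), (1, "HO4P", -2), (1, "H", 1)]
    print(is_balanced(left, right))       # True
    print(is_balanced(left, right[:2]))   # False: one H and one charge missing
```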

    The FRAXA and FRAXE allele repeat size of boys from the Avon Longitudinal Study of Parents and Children (ALSPAC)

    The FRAXA and FRAXE alleles of the FMR1 and FMR2 genes, located on the X chromosome, contain varying numbers of trinucleotide repeats. Large numbers of repeats at FRAXA (full mutations) manifest as Fragile X syndrome, which is associated with mental impairment that affects males more severely. In this paper, we present a dataset of FRAXA and FRAXE repeat-size frequencies obtained from DNA samples collected from boys enrolled in the Avon Longitudinal Study of Parents and Children (ALSPAC). DNA was extracted from several types of sample collected in ALSPAC clinics: cord blood, and venepuncture blood taken at 43 months, 61 months, seven years or nine years. The DNA was amplified at FRAXA and FRAXE using fluorescent PCR in the Wessex Regional Genetics Laboratory, Salisbury District Hospital. The mean repeat size for FRAXA is 28.92 (S.D. 5.44), the median 30 and the range 8 to 68. Particularly high numbers of boys had repeat sizes of 20 (10.67%) and 23 (7.35%). The mean repeat size for FRAXE is 17.41 (S.D. 3.94), with a median of 16 and a range of 0 to 61. FRAXA repeat size in particular shows a relatively high degree of variation, and we suggest that the extensive data available from the ALSPAC study open up areas of research into phenotypes associated with relatively unexplored repeat sizes. This could be particularly interesting for the lower repeat sizes that occur with high frequency at FRAXA in this population. As the data can be linked to exposures and phenotypes, they will provide a resource for researchers worldwide.
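Summary figures of this kind (mean, S.D., median, range and the percentage of boys at each repeat size) follow directly from a repeat-size frequency table. The Python sketch below shows the arithmetic using the standard `statistics` module; the function name `summarise` is hypothetical, the example table is made up (it is not ALSPAC data), and the use of the sample (n − 1) standard deviation is an assumption about the convention used.

```python
import statistics


def summarise(freq):
    """Summary statistics for a {repeat_size: number_of_boys} frequency table."""
    # Expand the table into one value per individual; fine for cohort-sized data.
    data = sorted(v for v, c in freq.items() for _ in range(c))
    n = len(data)
    return {
        "n": n,
        "mean": float(statistics.mean(data)),
        "sd": statistics.stdev(data),  # sample S.D. (n - 1 denominator), assumed
        "median": statistics.median(data),
        "range": (data[0], data[-1]),
        "percent": {v: 100 * c / n for v, c in sorted(freq.items())},
    }


if __name__ == "__main__":
    toy = {20: 2, 23: 1, 30: 3, 31: 1, 40: 1}  # made-up counts for illustration
    s = summarise(toy)
    print(s["mean"], s["median"], s["range"], s["percent"][20])
```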